Animating your ggplots may sound daunting. However, you only have to add an extra line or two of code and you have an animation! gganimate makes animation quite accessible for users of ggplot2.

A cheat sheet for what we’ll cover today: transition_reveal(), transition_time(), and transition_states().


Let’s load back up our data from the previous lessons on R by Adriana Picoral (picoral.github.io/resbaz_intro_to_r/parti.html) and from Kathryn Busby on ggplot2. I’ll name the dataframe avocado because I can’t remember what the other instructors named their data. We will also load our packages here.

library(tidyverse)
# install.packages("gganimate")
library(gganimate)
# install.packages("scales")
library(scales)
avocado <- read_csv("avocado.csv")

Avocado data is originally from www.kaggle.com/neuromusic/avocado-prices/data and included here to make download easier.

Let’s explore our data a little bit.

glimpse(avocado)
## Rows: 18,249
## Columns: 14
## $ X1             <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
## $ Date           <date> 2015-12-27, 2015-12-20, 2015-12-13, 2015-12-06, 201...
## $ AveragePrice   <dbl> 1.33, 1.35, 0.93, 1.08, 1.28, 1.26, 0.99, 0.98, 1.02...
## $ `Total Volume` <dbl> 64236.62, 54876.98, 118220.22, 78992.15, 51039.60, 5...
## $ `4046`         <dbl> 1036.74, 674.28, 794.70, 1132.00, 941.48, 1184.27, 1...
## $ `4225`         <dbl> 54454.85, 44638.81, 109149.67, 71976.41, 43838.39, 4...
## $ `4770`         <dbl> 48.16, 58.33, 130.50, 72.58, 75.78, 43.61, 93.26, 80...
## $ `Total Bags`   <dbl> 8696.87, 9505.56, 8145.35, 5811.16, 6183.95, 6683.91...
## $ `Small Bags`   <dbl> 8603.62, 9408.07, 8042.21, 5677.40, 5986.26, 6556.47...
## $ `Large Bags`   <dbl> 93.25, 97.49, 103.14, 133.76, 197.69, 127.44, 122.05...
## $ `XLarge Bags`  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00...
## $ type           <chr> "conventional", "conventional", "conventional", "con...
## $ year           <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015...
## $ region         <chr> "Albany", "Albany", "Albany", "Albany", "Albany", "A...
summary(avocado)
##        X1             Date             AveragePrice    Total Volume     
##  Min.   : 0.00   Min.   :2015-01-04   Min.   :0.440   Min.   :      85  
##  1st Qu.:10.00   1st Qu.:2015-10-25   1st Qu.:1.100   1st Qu.:   10839  
##  Median :24.00   Median :2016-08-14   Median :1.370   Median :  107377  
##  Mean   :24.23   Mean   :2016-08-13   Mean   :1.406   Mean   :  850644  
##  3rd Qu.:38.00   3rd Qu.:2017-06-04   3rd Qu.:1.660   3rd Qu.:  432962  
##  Max.   :52.00   Max.   :2018-03-25   Max.   :3.250   Max.   :62505647  
##       4046               4225               4770           Total Bags      
##  Min.   :       0   Min.   :       0   Min.   :      0   Min.   :       0  
##  1st Qu.:     854   1st Qu.:    3009   1st Qu.:      0   1st Qu.:    5089  
##  Median :    8645   Median :   29061   Median :    185   Median :   39744  
##  Mean   :  293008   Mean   :  295155   Mean   :  22840   Mean   :  239639  
##  3rd Qu.:  111020   3rd Qu.:  150207   3rd Qu.:   6243   3rd Qu.:  110783  
##  Max.   :22743616   Max.   :20470573   Max.   :2546439   Max.   :19373134  
##    Small Bags         Large Bags       XLarge Bags           type          
##  Min.   :       0   Min.   :      0   Min.   :     0.0   Length:18249      
##  1st Qu.:    2849   1st Qu.:    127   1st Qu.:     0.0   Class :character  
##  Median :   26363   Median :   2648   Median :     0.0   Mode  :character  
##  Mean   :  182195   Mean   :  54338   Mean   :  3106.4                     
##  3rd Qu.:   83338   3rd Qu.:  22029   3rd Qu.:   132.5                     
##  Max.   :13384587   Max.   :5719097   Max.   :551693.7                     
##       year         region         
##  Min.   :2015   Length:18249      
##  1st Qu.:2015   Class :character  
##  Median :2016   Mode  :character  
##  Mean   :2016                     
##  3rd Qu.:2017                     
##  Max.   :2018
class(avocado$Date) # make sure `Date` is actually a date type
## [1] "Date"
unique(avocado$region) # which regions are included here?
##  [1] "Albany"              "Atlanta"             "BaltimoreWashington"
##  [4] "Boise"               "Boston"              "BuffaloRochester"   
##  [7] "California"          "Charlotte"           "Chicago"            
## [10] "CincinnatiDayton"    "Columbus"            "DallasFtWorth"      
## [13] "Denver"              "Detroit"             "GrandRapids"        
## [16] "GreatLakes"          "HarrisburgScranton"  "HartfordSpringfield"
## [19] "Houston"             "Indianapolis"        "Jacksonville"       
## [22] "LasVegas"            "LosAngeles"          "Louisville"         
## [25] "MiamiFtLauderdale"   "Midsouth"            "Nashville"          
## [28] "NewOrleansMobile"    "NewYork"             "Northeast"          
## [31] "NorthernNewEngland"  "Orlando"             "Philadelphia"       
## [34] "PhoenixTucson"       "Pittsburgh"          "Plains"             
## [37] "Portland"            "RaleighGreensboro"   "RichmondNorfolk"    
## [40] "Roanoke"             "Sacramento"          "SanDiego"           
## [43] "SanFrancisco"        "Seattle"             "SouthCarolina"      
## [46] "SouthCentral"        "Southeast"           "Spokane"            
## [49] "StLouis"             "Syracuse"            "Tampa"              
## [52] "TotalUS"             "West"                "WestTexNewMexico"

You’ll notice that our region variable is kind of all over the place. Because I’ve reviewed this before, I know we need to separate out the US level, states, regions, and cities so our graphs are on the same level.

avocado_us <- avocado %>% filter(region == "TotalUS")

states <- c("California")
avocado_CA <- avocado %>% filter(region %in% states)

regions <- c("West","Southeast","SouthCentral","Plains","Northeast","Midsouth","GreatLakes","WestTexNewMexico","NorthernNewEngland")
avocado_region <- avocado %>% filter(region %in% regions)

avocado_cities <- avocado %>% filter(!region %in% c("TotalUS", states, regions))
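
As a quick sanity check, it can help to confirm that every region name falls into exactly one of the four levels. The sketch below is one way to do that. `label_level()` is a hypothetical helper (not part of the original lesson), and the small `toy` tibble only stands in for the real data.

```r
library(dplyr)

# Hypothetical helper: classify a region name into one aggregation level.
label_level <- function(region, states, regions) {
  case_when(
    region == "TotalUS" ~ "us",
    region %in% states  ~ "state",
    region %in% regions ~ "region",
    TRUE                ~ "city"
  )
}

# Toy stand-in for the `region` column of the avocado data
toy <- tibble(region = c("TotalUS", "California", "West", "Plains", "Albany", "Boise"))

# Count how many rows land in each level
level_counts <- toy %>%
  mutate(level = label_level(region,
                             states = c("California"),
                             regions = c("West", "Plains"))) %>%
  count(level)

level_counts
```

With the real data, the same idea would be `avocado %>% mutate(level = label_level(region, states, regions)) %>% count(level)`, and the counts should match the row counts of the four filtered data frames.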

We’re finally ready to make some plots, and then build the animation into these plots.

transition_reveal()

This type of transition is the simplest: it acts as if a piece of paper were being slid off the graph from left to right, slowly revealing the result. That’s how I think about it, at least. This assumes that your x-axis variable is the same variable you pass to transition_reveal().
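
One way to picture the reveal (a mental model only, not how gganimate is implemented internally): each frame keeps the rows up to the current x value and hides the rest. The toy dates and prices below are made up for illustration.

```r
library(dplyr)

# Made-up weekly values, for illustration only
toy <- tibble(
  Date = as.Date("2018-01-07") + 7 * 0:3,
  AveragePrice = c(1.10, 1.20, 1.00, 1.30)
)

# For each date, keep only the rows "revealed" so far
frames <- lapply(toy$Date, function(d) filter(toy, Date <= d))

# Number of rows visible in each successive frame
rows_per_frame <- sapply(frames, nrow)
rows_per_frame
```

Each successive frame contains one more row than the last, which is why the line appears to grow from left to right.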

For this, let’s first build a static line plot that has date on the x-axis. Looking through the data, we could use AveragePrice or Total Volume on the y axis, and we could disaggregate by region, size of avocado, or type (organic versus conventional).

Let’s stick to the TotalUS aggregation dataset we made (avocado_us) and look at the average price of conventional and organic avocados over time.

ggplot(data = avocado_us, 
       mapping = aes(x = Date, y = AveragePrice, color = type)) +
  geom_line()

If we feel good on time, we can make a few adjustments to the plot before animating it.

ggplot(data = avocado_us, 
       mapping = aes(x = Date, y = AveragePrice, color = type)) +
  geom_line() +
  scale_y_continuous(labels = scales::dollar_format()) +  # format that y axis! 
  scale_color_manual(values= c("darkgreen", "darkolivegreen3")) +
  theme_minimal() +
  labs(title = "Average Price of US Avocados",
       caption = "Source: Kaggle")

This looks a lot better. I wonder what happened in the summer of 2015! Now let’s animate this. The key to this animation is transition_reveal(). Inside the function, we write our x-axis variable. It will take a few moments to render, and then you should see an animated plot in your plots pane.

ggplot(data = avocado_us, 
       mapping = aes(x = Date, y = AveragePrice, color = type)) +
  geom_line() +
  scale_y_continuous(labels = scales::dollar_format()) +  # format that y axis! 
  scale_color_manual(values= c("darkgreen", "darkolivegreen3")) +
  theme_minimal() +
  labs(title = "Average Price of US Avocados",
       caption = "Source: Kaggle") +
  transition_reveal(Date)

Let’s also save this, since rendering takes some time each time we run the code. By default, anim_save() saves the most recently rendered animation.

anim_save(filename = "type_reveal.gif")
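
If you want more control over rendering, one option is to call animate() yourself and pass the result to anim_save(). The sketch below uses made-up toy values so it runs without avocado.csv. The frame count, frame rate, dimensions, and output path are arbitrary choices, and it assumes the gifski package is installed for the GIF renderer.

```r
library(ggplot2)
library(gganimate)

# Made-up weekly prices, for illustration only
toy <- data.frame(
  Date = as.Date("2018-01-07") + 7 * 0:9,
  AveragePrice = c(1.10, 1.15, 1.08, 1.20, 1.25, 1.18, 1.30, 1.28, 1.35, 1.32)
)

p <- ggplot(toy, aes(x = Date, y = AveragePrice)) +
  geom_line() +
  transition_reveal(Date)

# Render once with explicit settings and keep the result in an object
anim <- animate(p, nframes = 20, fps = 10,
                width = 480, height = 320,
                renderer = gifski_renderer())

# Save that specific animation rather than relying on the last one rendered
out_file <- file.path(tempdir(), "toy_reveal.gif")
anim_save(filename = out_file, animation = anim)
```

Fewer frames render faster, which is handy while you are still tweaking the plot. You can raise nframes for the final version.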

Challenge

Take a few minutes to try and plot the changes in total volume of organic avocados across time for the different regions of the USA.


transition_time()

transition_time() creates new “layers” of the animation over a continuous variable, usually time (I’ve never seen an exception to that). While this works best with geom_point, there are many other options you can play around with.

Let’s use two continuous variables to plot this. Let’s see how well price explains the volume of avocados sold for non-organic avocados (though it’s been a while since I took Econ101). Let’s do this for the different cities in the US, omitting states and regions.

avocado_cities_filtered <- avocado_cities %>% 
  filter(type == "conventional",
         Date > as.Date("2018-01-01"))

ggplot(data = avocado_cities_filtered,
       mapping = aes(x = AveragePrice, y = `Total Volume`, color = region)) +
  geom_point()

That legend is really going to get in the way. Let’s remove it and customize the circles before animating.

ggplot(data = avocado_cities_filtered,
       mapping = aes(x = AveragePrice, y = `Total Volume`, color = region)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_x_continuous(labels = scales::dollar_format()) +
  geom_point(aes(size = `Total Volume`), alpha = .6) +  
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Avocados sold by price and city") 

In practice, the animation is basically layering a bunch of plots on top of each other, as if they were facet_wraps. When I’m planning out an animation, I often use facet_wrap like you learned this morning to see the different layers before I “assemble” them.

ggplot(data = avocado_cities_filtered,
       mapping = aes(x = AveragePrice, y = `Total Volume`, color = region)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_x_continuous(labels = scales::dollar_format()) +
  geom_point(aes(size = `Total Volume`), alpha = .6) +  
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Avocados sold by price and city") +
  facet_wrap(~Date)

Now we can move on to animating this. transition_time() will replace the previous dot, making it hard to see any trends. Let’s add shadow_wake so we can see the direction between points.

One really cool trick I like to employ is writing the point in time we’re currently animating into the subtitle. Before, it didn’t really matter because the date was on the x-axis, but now it’s hidden. For that, we need to add some {} in the subtitle argument of labs().

ggplot(data = avocado_cities_filtered,
       mapping = aes(x = AveragePrice, y = `Total Volume`, color = region)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_x_continuous(labels = scales::dollar_format()) +
  geom_point(aes(size = `Total Volume`), alpha = .6) +  
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Avocados sold by price and city",
       subtitle = "Date: {frame_time}") +
  transition_time(Date) +
  shadow_wake(wake_length = 0.2)    

Let’s also save this, since each time we run the code it takes some time.

anim_save(filename = "type_time.gif")

Challenge

Can you use transition_time to show how the price of organic avocados changes over time for California?


transition_states()

transition_states() creates a new animation layer across a categorical variable instead of over time.

avocado_region_long <- avocado_region %>% 
  pivot_longer(cols = c(`4046`,`4225`,`4770`),
               names_to = "size",
               values_to = "volume")

ggplot(data = avocado_region_long,
       mapping = aes(x = size, y = volume, color = size)) +
  geom_boxplot() 
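
The pivot_longer() step above reshapes the three size-code columns into a single pair of columns, so that size can serve as one categorical variable. A minimal sketch on a made-up two-row table (the numbers are illustrative, not from the avocado data):

```r
library(tidyr)
library(tibble)

# Made-up wide table: one column per avocado size code
wide <- tibble(
  region = c("West", "Plains"),
  `4046` = c(10, 20),
  `4225` = c(30, 40),
  `4770` = c(1, 2)
)

# Stack the three size columns into `size` (former column name)
# and `volume` (former cell value)
long <- pivot_longer(wide,
                     cols = c(`4046`, `4225`, `4770`),
                     names_to = "size",
                     values_to = "volume")

long
```

Each original row becomes three rows, one per size code, which is the shape ggplot needs to map size to an axis or to animation states.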

Let’s customize this a little to make it look nicer.

ggplot(data = avocado_region_long,
       mapping = aes(x = size, y = volume, color = size)) +
  geom_boxplot() +
  theme_classic() +
  scale_y_continuous(labels = scales::comma_format()) +
  labs(title = "Boxplot of volume sold by Avocado Size")

It isn’t particularly helpful that the previous view completely disappears, as in transition_time(). Instead of using shadow_wake(), let’s add shadow_mark() to the animated plot to keep the past views visible.

ggplot(data = avocado_region_long,
       mapping = aes(x = size, y = volume, color = size)) +
  geom_boxplot() +
  theme_classic() +
  scale_y_continuous(labels = scales::comma_format()) +
  labs(title = "Boxplot of volume sold by Avocado Size") +
  transition_states(size, state_length = 1, transition_length = 1) +
  shadow_mark(alpha = 0.3, size = 0.5)    
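
The subtitle trick from the transition_time() section also works here, but the label variable differs by transition. To my knowledge, gganimate exposes frame_time for transition_time(), frame_along for transition_reveal(), and closest_state for transition_states(). A minimal sketch with made-up toy values:

```r
library(ggplot2)
library(gganimate)

# Made-up volumes for three avocado size codes, for illustration only
toy <- data.frame(
  size = rep(c("4046", "4225", "4770"), each = 4),
  volume = c(120, 150, 130, 160, 300, 280, 310, 290, 20, 25, 18, 30)
)

p <- ggplot(toy, aes(x = size, y = volume, color = size)) +
  geom_boxplot() +
  labs(title = "Volume sold by avocado size",
       subtitle = "Size: {closest_state}") +  # filled in per frame by gganimate
  transition_states(size, state_length = 1, transition_length = 1) +
  shadow_mark(alpha = 0.3)

# Printing `p` (or calling animate(p)) renders the animation
```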

Let’s also save this, since each time we run the code it takes some time.

anim_save(filename = "type_state.gif")


Challenge answers:

Challenge 1: Take a few minutes to try and plot the changes in total volume of organic avocados across time for the different regions of the USA.

ggplot(data = filter(avocado_region, type == "organic"), 
       aes(x = Date, y = `Total Volume`, color = region)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Total Volume of Organic Avocados by US Region",
       caption = "Source: Kaggle",
       subtitle = "Date: {frame_along}") +
  transition_reveal(Date)

anim_save("challenge_1.gif")

Challenge 2: Can you use transition_time to show how the price of organic avocados changes over time for California?

ggplot(data = filter(avocado_CA, type == "organic"),
       mapping = aes(x = Date, y = AveragePrice)) +
  scale_y_continuous(labels = scales::dollar_format()) +
  geom_point(alpha = .6) +  
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "The fluctuating price of organic avocados in California",
       subtitle = "Date: {frame_time}") +
  transition_time(Date) +
  shadow_wake(wake_length = 0.2)    

anim_save("challenge_2.gif")